Introduction and Motivation
Attending university is a big decision for students: it has a large effect on a person's future and comes at a high cost in both money and time. Going to university is therefore an investment, and whether that investment pays off is a genuine cause for concern.
The US is one of the most popular yet most costly higher-education destinations in the world, and the range of costs is wide: annual tuition fees range from $5,000 to $50,000 (£4,074 to £40,746). Most undergraduate degrees last four years, and on average students graduate with debt of $132,860 (£101,505). We are therefore interested in how good an investment an American education really is.
In this report, we look at the various factors that influence the cost of a US education and whether studying at renowned US universities is really worth the investment. On inspecting the data, we found several variables that let us assess the cost of studying in the United States. We have tried to cover the most important of these so that we can evaluate and analyse them efficiently. Because most variables are of character type and only a few are numeric, the scope for visualisation is limited.
Data description
Data collection gives a business or its management the quality information it needs for further analysis, study and research, and hence for making informed decisions.
In this study we perform an exploratory data analysis of the College Tuition, Diversity, and Pay dataset.
This collection of data gathers information from various sources, much of it originally from the US Department of Education.
Because the data taken from the original pages turned out to be very large, the author filtered them into five comma-separated files, for the convenience of the audience and to produce informative results, by web scraping in R with the rvest package.
On further inspection with the visdat package, we also found that some of the datasets have a few missing values. We dealt with all of them by removing the missing values.
Let’s look at small snippets of the various datasets and the percentage of missing values in their variables to get a better understanding of the variables.
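The missing-value check and removal can be sketched in plain Python. This is an illustration only, not the report's code (which used visdat in R). It assumes each dataset has been read into a list of dictionaries with None standing in for NA; the three rows are the tuition_cost snippet shown further down, where room_and_board is NA for Aaniiih Nakoda College. The helper names are hypothetical.

```python
# Three rows from the tuition_cost snippet; None stands in for R's NA.
rows = [
    {"name": "Aaniiih Nakoda College", "room_and_board": None, "in_state_tuition": 2380},
    {"name": "Abilene Christian University", "room_and_board": 10350, "in_state_tuition": 34850},
    {"name": "Abraham Baldwin Agricultural College", "room_and_board": 8474, "in_state_tuition": 4128},
]


def missing_percentages(rows):
    """Percentage of missing (None) values in each column."""
    n = len(rows)
    return {col: 100 * sum(r[col] is None for r in rows) / n for col in rows[0]}


def drop_missing(rows):
    """Keep only complete rows, i.e. rows without any missing value."""
    return [r for r in rows if all(v is not None for v in r.values())]


print(missing_percentages(rows))  # room_and_board is about 33.3% missing
print(len(drop_missing(rows)))    # 2 complete rows remain
```

Dropping whole rows is the simplest choice, but it also discards the non-missing values in those rows (here Aaniiih Nakoda College's tuition), which is worth keeping in mind when reading the later figures.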
Data on Diversity
The diversity_school dataset comes from The Chronicle of Higher Education.

- Name has character datatype and gives the school name.
- Total Enrolment has double datatype and gives the total number of students enrolled.
- State has character datatype and gives the state.
- Category has character datatype and gives the demographic group (racial/ethnic or gender).
- Enrolment has double datatype and gives the enrolment for that specific category.
Data on Tuition History
The historical_tuition dataset has been obtained from the U.S. Department of Education, National Center for Education Statistics.

| type | year | tuition_type | tuition_cost |
|---|---|---|---|
| All Institutions | 1985 | All Constant | 10893 |
| All Institutions | 1985 | 4 Year Constant | 12274 |
| All Institutions | 1985 | 2 Year Constant | 7508 |
- type, having character datatype, describes the type of school (All, Public, Private).
- year, having character format, describes the corresponding academic year.
- tuition_type, having character datatype, gives the degree duration and whether costs are in constant (inflation-adjusted) or current dollars: All Constant, 4 Year Constant, 2 Year Constant, All Current (current to the year), 4 Year Current, 2 Year Current.
- tuition_cost, having double datatype, gives the tuition cost in USD.
Data on Salary Potential
The salary_potential dataset originates from PayScale.

| rank | name | state_name | early_career_pay | mid_career_pay | make_world_better_percent | stem_percent |
|---|---|---|---|---|---|---|
| 1 | Auburn University | Alabama | 54400 | 104500 | 51 | 31 |
| 2 | University of Alabama in Huntsville | Alabama | 57500 | 103900 | 59 | 45 |
| 3 | The University of Alabama | Alabama | 52300 | 97400 | 50 | 15 |
- rank, having double datatype, gives the potential-salary rank within the state.
- early_career_pay, having double datatype, gives the estimated early-career pay in USD.
- mid_career_pay, having double datatype, gives the estimated mid-career pay in USD.
- make_world_better_percent, having double datatype, gives the percentage of alumni who think they are making the world a better place.
- stem_percent, having double datatype, gives the percentage of the student body in science, technology, engineering and mathematics (STEM).
Data on the Tuition Cost
The tuition_cost dataset is obtained from the Chronicle of Higher Education.

| name | state | state_code | type | degree_length | room_and_board | in_state_tuition | in_state_total | out_of_state_tuition | out_of_state_total |
|---|---|---|---|---|---|---|---|---|---|
| Aaniiih Nakoda College | Montana | MT | Public | 2 Year | NA | 2380 | 2380 | 2380 | 2380 |
| Abilene Christian University | Texas | TX | Private | 4 Year | 10350 | 34850 | 45200 | 34850 | 45200 |
| Abraham Baldwin Agricultural College | Georgia | GA | Public | 2 Year | 8474 | 4128 | 12602 | 12550 | 21024 |
- in_state_tuition, having double datatype, gives the tuition fee for in-state residents in USD.
- out_of_state_tuition, having double datatype, gives the tuition fee for out-of-state residents in USD.
Data on Tuition Income
The tuition_income dataset has been obtained from Tuition Tracker and Priceonomics.

| name | state | total_price | year | campus | net_cost | income_lvl |
|---|---|---|---|---|---|---|
| Piedmont International University | NC | 20174 | 2016 | On Campus | 11475 | 0 to 30,000 |
| Piedmont International University | NC | 20174 | 2016 | On Campus | 11451 | 30,001 to 48,000 |
| Piedmont International University | NC | 20174 | 2016 | On Campus | 16229 | 48_001 to 75,000 |
- campus, having character datatype, indicates whether the record is On Campus or Off Campus.
- net_cost, having double datatype, gives the net cost: the average amount actually paid after scholarships or awards.
- income_lvl, having character datatype, gives the income bracket.
Data Structure and Cleaning Process
All the information in these data files was gathered by web scraping with the rvest library: reading the HTML from each page URL, selecting HTML nodes, building the datasets as tibbles, mutating new columns and later binding rows together. The scraped data were written to CSV files, which were later read into the R environment with the readr package.
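The scrape-to-CSV round trip can be illustrated in Python with only the standard library. This is not the report's R code: rvest and readr are swapped for html.parser and csv, and the HTML string is a hypothetical stand-in for a downloaded page (a real scrape would first fetch the page over HTTP). The class and function names are hypothetical.

```python
import csv
import io
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect the text of every <th>/<td> cell, one list per <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row being read
        self._cell = None  # text fragments of the cell being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("th", "td") and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("th", "td") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def table_to_csv(html):
    """Parse the table rows in `html` and serialise them as CSV text."""
    parser = TableParser()
    parser.feed(html)
    parser.close()
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(parser.rows)
    return buffer.getvalue()


# Hypothetical page fragment mirroring the first historical_tuition row.
html = """<table>
<tr><th>type</th><th>year</th><th>tuition_type</th><th>tuition_cost</th></tr>
<tr><td>All Institutions</td><td>1985</td><td>All Constant</td><td>10893</td></tr>
</table>"""

csv_text = table_to_csv(html)
records = list(csv.DictReader(io.StringIO(csv_text)))  # read back, as readr does in R
print(records[0]["tuition_cost"])  # 10893 (as text; convert types after reading)
```

As with readr, everything comes back as text from the CSV layer here; numeric columns need converting before any arithmetic.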
Analysis and Findings
One of the key reasons students choose to study in the USA is the country's prestige for renowned higher-education programs. A degree from one of the best higher-education programs in the world can make a graduate stand out from peers with similar backgrounds and job experience.
Let us examine the important facts about the US education system. We look at various aspects of studying there: the diversity of education, the history of education, the cost of tuition, income among people in the United States, the salaries currently on offer in the market, and similar analyses.
The Distribution of Tuition Fees in the US
Let us look at the distribution of tuition fees in the United States for in-state residents and out-of-state students across the various types of institution.
Figure 5.1 displays the distribution of tuition fees for in-state residents and out-of-state students. Broadly, US universities offer two types of degree course: 2-year courses, which typically lead to an associate degree (often at a community college), and 4-year courses, which are generally bachelor's degrees. Because tuition fees differ markedly between the two, we separated them and plotted two different graphs. In Figure 5.1, the red density plot shows the tuition fee for out-of-state students and the histogram shows the tuition fee for in-state residents. We observe that the density for private universities is low compared with the other two types, which may be due to their higher fees. The figure suggests that both in-state and out-of-state students mostly enrol in for-profit or public universities. Another noticeable fact is that, at public universities, out-of-state students end up paying more than in-state residents.
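One way to put a number on the out-of-state gap is to average the difference between out_of_state_tuition and in_state_tuition for each degree length and institution type. The Python sketch below is not the report's R code; it uses only the three tuition_cost rows from the data description, so the figures illustrate the calculation rather than the national picture, and the function name is hypothetical.

```python
from collections import defaultdict

# The three tuition_cost rows from the data description.
rows = [
    {"name": "Aaniiih Nakoda College", "type": "Public", "degree_length": "2 Year",
     "in_state_tuition": 2380, "out_of_state_tuition": 2380},
    {"name": "Abilene Christian University", "type": "Private", "degree_length": "4 Year",
     "in_state_tuition": 34850, "out_of_state_tuition": 34850},
    {"name": "Abraham Baldwin Agricultural College", "type": "Public", "degree_length": "2 Year",
     "in_state_tuition": 4128, "out_of_state_tuition": 12550},
]


def mean_out_of_state_premium(rows):
    """Mean extra tuition paid by out-of-state students, per (degree_length, type)."""
    gaps = defaultdict(list)
    for r in rows:
        gaps[(r["degree_length"], r["type"])].append(r["out_of_state_tuition"] - r["in_state_tuition"])
    return {group: sum(g) / len(g) for group, g in gaps.items()}


print(mean_out_of_state_premium(rows))
# {('2 Year', 'Public'): 4211.0, ('4 Year', 'Private'): 0.0}
```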
Difference in fees after any Scholarship / Awards Earned
Many students receive grants and scholarships of various kinds. In this section, we examine how grants and scholarships are awarded to students in different income groups and how much students in each group end up paying.
In the US, university courses are broadly offered on campus and off campus. Although there is only a small difference in net_cost between on-campus and off-campus courses, total_price varies significantly, so we display them separately. Figure 5.2 shows the average cost a student incurs in each income group, and figure 5.3 shows the scholarships earned by students in each income group. Note that the variable net_cost has many negative values. One may wonder how the tuition paid can be negative. On further probing, we found that many universities provide financial assistance to students and extend it to their families as well, which is why net_cost can be negative. Also, the data for on-campus courses for the year 2010 are missing. Finally, we recoded the income level 48_001 to 75,000 as 48,001 to 75,000.
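The recoding and the check for negative net costs can be sketched as follows, in Python rather than the report's R. The rows are the tuition_income snippet from the data description; the mapping fixes only the one malformed income label mentioned above, and the function names are hypothetical.

```python
# The three tuition_income rows from the data description.
rows = [
    {"name": "Piedmont International University", "year": 2016, "campus": "On Campus",
     "total_price": 20174, "net_cost": 11475, "income_lvl": "0 to 30,000"},
    {"name": "Piedmont International University", "year": 2016, "campus": "On Campus",
     "total_price": 20174, "net_cost": 11451, "income_lvl": "30,001 to 48,000"},
    {"name": "Piedmont International University", "year": 2016, "campus": "On Campus",
     "total_price": 20174, "net_cost": 16229, "income_lvl": "48_001 to 75,000"},
]

# Malformed label in the raw data -> cleaned label.
INCOME_RECODE = {"48_001 to 75,000": "48,001 to 75,000"}


def recode_income(rows):
    """Return copies of the rows with malformed income labels replaced."""
    return [{**r, "income_lvl": INCOME_RECODE.get(r["income_lvl"], r["income_lvl"])} for r in rows]


def negative_net_cost(rows):
    """Rows whose average net cost is below zero, i.e. aid exceeded the total price."""
    return [r for r in rows if r["net_cost"] < 0]


clean = recode_income(rows)
print([r["income_lvl"] for r in clean])
print(len(negative_net_cost(clean)))  # 0: none of these three rows is negative
```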
In Figure 5.2 we group by year and income level and summarise net_cost with the mean to obtain the average tuition cost students pay in each income group. The figure shows that, over the years, students in lower income groups bear less of the tuition burden than students in higher income groups.
In figure 5.3 we obtain the scholarship earned by students as the difference between total_price and net_cost, and then summarise with the mean to obtain the average scholarship earned by each income group. Over the years, students in lower income groups clearly earn more in scholarships than those in higher income groups. Interestingly, in 2017 and 2018, off-campus students in the over-110,000 income group received more in scholarships than on-campus students in the same group.
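Both summaries reduce to a grouped mean: one over net_cost, the other over total_price minus net_cost. The Python sketch below is not the report's dplyr code; it assumes rows shaped like the tuition_income snippet with the income label already recoded, and groups by year, campus and income level (keeping campus in the key is an assumption, since the report shows on- and off-campus separately). Each snippet group has a single row, so each mean simply equals that row's value.

```python
from collections import defaultdict

# tuition_income snippet rows, with the income label already recoded.
rows = [
    {"year": 2016, "campus": "On Campus", "income_lvl": "0 to 30,000", "total_price": 20174, "net_cost": 11475},
    {"year": 2016, "campus": "On Campus", "income_lvl": "30,001 to 48,000", "total_price": 20174, "net_cost": 11451},
    {"year": 2016, "campus": "On Campus", "income_lvl": "48,001 to 75,000", "total_price": 20174, "net_cost": 16229},
]


def group_means(rows, value):
    """Mean of value(row) for each (year, campus, income_lvl) group."""
    groups = defaultdict(list)
    for r in rows:
        groups[(r["year"], r["campus"], r["income_lvl"])].append(value(r))
    return {key: sum(v) / len(v) for key, v in groups.items()}


avg_net_cost = group_means(rows, lambda r: r["net_cost"])                        # quantity behind Figure 5.2
avg_scholarship = group_means(rows, lambda r: r["total_price"] - r["net_cost"])  # quantity behind Figure 5.3
print(avg_scholarship)
```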
Students' Career Development across American States
After graduating, a student's progress can be measured by their salary growth. Let's look at the states where we see the greatest salary growth.
To measure career improvement, we calculated, for each university, the percentage increase from early-career pay to mid-career pay, grouped the results by state, summarised them with the mean to obtain the average salary increase in each state, and reordered the states accordingly.
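A worked version of this calculation, as a Python sketch rather than the report's R code, using the three Alabama rows of the salary_potential snippet. Growth is (mid_career_pay - early_career_pay) / early_career_pay × 100; for Auburn University that is (104500 - 54400) / 54400 × 100 ≈ 92.10%. The function names are hypothetical.

```python
from collections import defaultdict

# (name, state_name, early_career_pay, mid_career_pay) from the salary_potential snippet.
salary = [
    ("Auburn University", "Alabama", 54400, 104500),
    ("University of Alabama in Huntsville", "Alabama", 57500, 103900),
    ("The University of Alabama", "Alabama", 52300, 97400),
]


def career_growth(early, mid):
    """Percentage increase from early-career pay to mid-career pay."""
    return (mid - early) / early * 100


def mean_growth_by_state(rows):
    """Average career growth per state, ordered from highest to lowest."""
    by_state = defaultdict(list)
    for _name, state, early, mid in rows:
        by_state[state].append(career_growth(early, mid))
    means = {state: sum(g) / len(g) for state, g in by_state.items()}
    return dict(sorted(means.items(), key=lambda kv: kv[1], reverse=True))


print(round(career_growth(54400, 104500), 2))  # 92.1
print(mean_growth_by_state(salary))            # Alabama averages about 86.34
```

Averaging the per-university percentages gives each university equal weight regardless of size, which matches the report's description of taking the mean by state.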
In figure 5.4 we can see that graduates of universities in Wyoming, New Mexico, California, New York and Ohio show the highest growth in mid-career salaries, while those in Delaware, North Dakota, Alaska, Idaho and New Hampshire show the least. Although the difference between the states with the highest and lowest growth is not large, figure 5.4 is a useful way to compare career growth across states.
Furthermore, we created Table 5.5, which lists all the universities whose alumni experienced career growth of more than 75%. We also ranked the universities within each state by this improvement, so the table lets the user find the top universities in any state along with their improvement percentage.
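The filtering and within-state ranking behind such a table can be sketched in Python (not the report's R code). The rows are the Alabama entries of the salary_potential snippet, all of which happen to clear the 75% threshold; the function name and the threshold parameter are hypothetical.

```python
from collections import defaultdict

# (name, state_name, early_career_pay, mid_career_pay) from the salary_potential snippet.
salary = [
    ("Auburn University", "Alabama", 54400, 104500),
    ("University of Alabama in Huntsville", "Alabama", 57500, 103900),
    ("The University of Alabama", "Alabama", 52300, 97400),
]


def growth_table(rows, threshold=75.0):
    """Universities whose career growth exceeds `threshold` percent,
    ranked within their state (rank 1 = highest growth)."""
    by_state = defaultdict(list)
    for name, state, early, mid in rows:
        growth = (mid - early) / early * 100
        if growth > threshold:
            by_state[state].append((name, growth))
    table = []
    for state, universities in by_state.items():
        universities.sort(key=lambda u: u[1], reverse=True)
        for rank, (name, growth) in enumerate(universities, start=1):
            table.append({"state": state, "rank": rank, "name": name, "growth": round(growth, 2)})
    return table


for row in growth_table(salary):
    print(row)
```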
Alumni of US Universities Who Make the World a Better Place
Alumni play many important roles, such as helping to build an organization's brand through word-of-mouth marketing, and they play a significant role in promoting organizations that benefit students and their activities. Let us look at which US states have the alumni who most feel they make an impact on the world.
To arrive at figure 5.6, we grouped by state and summarised using the mean to calculate the average percentage of alumni who think they make the world a better place. We then plotted a circular bar chart, colouring the bars by this mean value to distinguish the states, and reordered the bars to show clearly the states whose alumni think they make the most and the least impact.
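The state-level average behind the circular bar chart is another grouped mean followed by a sort. A Python sketch (not the report's R code) using the make_world_better_percent values from the salary_potential snippet; rows with a missing percentage are skipped, consistent with the removal of missing values described earlier. The function name is hypothetical.

```python
from collections import defaultdict

# (name, state_name, make_world_better_percent) from the salary_potential snippet.
impact = [
    ("Auburn University", "Alabama", 51),
    ("University of Alabama in Huntsville", "Alabama", 59),
    ("The University of Alabama", "Alabama", 50),
]


def mean_impact_by_state(rows):
    """Average make_world_better_percent per state, highest first; missing values skipped."""
    by_state = defaultdict(list)
    for _name, state, percent in rows:
        if percent is not None:
            by_state[state].append(percent)
    means = {state: sum(v) / len(v) for state, v in by_state.items()}
    return sorted(means.items(), key=lambda kv: kv[1], reverse=True)


print(mean_impact_by_state(impact))  # Alabama averages 160 / 3, about 53.3
```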
Figure 5.6 shows that alumni from New Mexico, Mississippi and Louisiana are the most likely to think they contribute to making a greater impact on the world, whereas alumni from Nevada, Rhode Island and New Jersey are the least likely to think so.
Diversity in Education in the US
The US ranks among the top countries for cultural diversity and attracts students from around the world. Let's evaluate the various ethnic groups and their percentage of US university enrolment.
To calculate the percentage of US university enrolment for each ethnic group, we first removed the Total Minority category, as it is the sum of all the minority categories such as 'Asian' and 'Black'. We also merged Two Or More Races, Unknown and Non-Resident Foreign into an Other category, as each contributes very little to enrolment. To plot the values, we used a letter-value boxplot from the lvplot R package instead of a traditional boxplot, as can be seen in figure 5.7.
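The category clean-up can be sketched in Python (not the report's code). The school and its enrolment figures below are hypothetical, the category labels are assumed to match those quoted above, and the function name is hypothetical too.

```python
# Small categories pooled into "Other" in the report.
POOLED = {"Two Or More Races", "Unknown", "Non-Resident Foreign"}


def collapse_categories(rows):
    """Drop the Total Minority subtotal and pool the small categories into Other.

    Returns {(school name, category): enrolment}.
    """
    totals = {}
    for r in rows:
        category = r["category"]
        if category == "Total Minority":
            continue  # already the sum of the individual minority categories
        key = (r["name"], "Other" if category in POOLED else category)
        totals[key] = totals.get(key, 0) + r["enrollment"]
    return totals


# Hypothetical school, for illustration only.
rows = [
    {"name": "Example University", "category": "White", "enrollment": 500},
    {"name": "Example University", "category": "Black", "enrollment": 200},
    {"name": "Example University", "category": "Total Minority", "enrollment": 220},
    {"name": "Example University", "category": "Two Or More Races", "enrollment": 20},
    {"name": "Example University", "category": "Unknown", "enrollment": 10},
]
print(collapse_categories(rows))
```

Dropping the subtotal matters because keeping it would count every minority student twice.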
A letter-value boxplot overcomes the shortcomings of a conventional boxplot by conveying more information about the tails, where the outliers lie, through letter values. The legend beside the plot lists the letters and the colour each one represents. Each letter marks a pair of letter values, and roughly the stated percentage of the observations lies beyond each of them in its tail:
- M represents the median and roughly 50% of the observations.
- F represents the quartiles and roughly 25% of the observations.
- E denotes roughly 12.5% of the observations.
- D denotes roughly 6.25% of the observations.
- C denotes roughly 3.13% of the observations.
- B denotes roughly 1.56% of the observations.
- A denotes roughly 0.8% of the observations.
- Z denotes roughly 0.4% of the observations.
- Y denotes roughly 0.2% of the observations.
- X denotes roughly 0.1% of the observations.
- W denotes roughly 0.05% of the observations.
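The letter values themselves can be computed with the usual depth rule: the median sits at depth (n + 1)/2, and each further letter value sits at depth (⌊d⌋ + 1)/2, where d is the previous depth, so each step roughly halves the tail. Half-integer depths average two neighbouring order statistics. This is a plain-Python sketch for illustration; the number of letters k is an assumed parameter, and lvplot's own rule for choosing how many letters to draw is not reproduced here.

```python
import math

LETTERS = "MFEDCBAZYXW"


def value_at_depth(sorted_vals, depth):
    """Order statistic at a depth counted from the start; half-integer depths average two neighbours."""
    lo = sorted_vals[math.floor(depth) - 1]
    hi = sorted_vals[math.ceil(depth) - 1]
    return (lo + hi) / 2


def letter_values(values, k=4):
    """Return (letter, lower, upper) for the first k letter values."""
    x = sorted(values)
    n = len(x)
    if n == 0:
        return []
    desc = x[::-1]  # counting depth from the top gives the upper letter value
    depth = (n + 1) / 2
    out = []
    for letter in LETTERS[:k]:
        out.append((letter, value_at_depth(x, depth), value_at_depth(desc, depth)))
        depth = (math.floor(depth) + 1) / 2
    return out


print(letter_values(range(1, 16), k=3))
# [('M', 8.0, 8.0), ('F', 4.5, 11.5), ('E', 2.5, 13.5)]
```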
Figure 5.7 shows that White students form the majority of students enrolled in American universities, followed by Black students and other minorities. The Native Hawaiian/Pacific Islander and American Indian/Alaska Native groups have the lowest enrolment in American universities.
Is diversity amongst students in the United States the same across all states and universities?
In this section we investigate how accommodating different states are of various minorities in their universities; to do so, we explored the percentage of minority students enrolled in universities across the US.
To map minorities across the US states, we first calculated the enrolment percentage of all minorities in US universities and then used the R leaflet package to plot the minority percentages by state, as seen in figure 5.8. This was done with the help of a GeoJSON file containing the states' geographical boundaries. The resulting choropleth map is interactive: hovering over a state shows the percentage of minority students enrolled in its universities.
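A sketch of the state-level minority percentage in Python (not the report's R code; the map itself is not reproduced). It assumes each diversity_school row is one school and category, that the school's total enrolment is repeated on every one of its rows (so it is counted once per school), and that the Total Minority category is the numerator. The field names, schools and figures are hypothetical.

```python
from collections import defaultdict


def minority_share_by_state(rows):
    """Percentage of enrolled students counted as Total Minority, per state."""
    minority = defaultdict(float)
    total = defaultdict(float)
    seen = set()
    for r in rows:
        if r["category"] == "Total Minority":
            minority[r["state"]] += r["enrollment"]
        school = (r["name"], r["state"])
        if school not in seen:  # total_enrollment repeats on every row of a school
            seen.add(school)
            total[r["state"]] += r["total_enrollment"]
    return {state: 100 * minority[state] / t for state, t in total.items() if t > 0}


# Hypothetical schools, for illustration only.
rows = [
    {"name": "Hypothetical College A", "state": "Hawaii", "category": "Total Minority",
     "enrollment": 700, "total_enrollment": 1000},
    {"name": "Hypothetical College A", "state": "Hawaii", "category": "White",
     "enrollment": 300, "total_enrollment": 1000},
    {"name": "Hypothetical College B", "state": "Maine", "category": "Total Minority",
     "enrollment": 50, "total_enrollment": 500},
]
print(minority_share_by_state(rows))  # {'Hawaii': 70.0, 'Maine': 10.0}
```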
The map shows that universities in Hawaii have the highest share of minority students and universities in Maine the lowest.
Having looked at how accommodating states are of students from different ethnic backgrounds, we now turn to how expensive and inexpensive universities accommodate these students.
To divide universities into two economic groups, Expensive and Inexpensive, we first grouped all the universities by name, summarised them to find the mean tuition cost, and selected the 50 colleges with the highest tuition fees and the 50 with the lowest. We then joined the tuition dataset with the diversity dataset on university name and filtered out the Total Minority, Women and other categories to show the distribution more clearly. Finally, we calculated the enrolment percentages for the 50 most expensive and the 50 least expensive colleges and rendered two separate plots showing the percentage of students from each ethnic group enrolled.
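The split-and-join can be sketched in Python (not the report's R code). Assumptions: a single hypothetical "tuition" field per row, shares computed over the categories that remain after filtering, and only Total Minority and Women excluded by default (which further categories the report dropped is left to the caller). The colleges and figures are hypothetical.

```python
from collections import defaultdict

# Total Minority is a subtotal and Women overlaps the racial groups, so both are left out.
EXCLUDED = {"Total Minority", "Women"}


def split_by_cost(tuition_rows, n=50):
    """Names of the n colleges with the highest and the n with the lowest mean tuition."""
    values = defaultdict(list)
    for r in tuition_rows:
        values[r["name"]].append(r["tuition"])
    ranked = sorted(values, key=lambda name: sum(values[name]) / len(values[name]), reverse=True)
    return set(ranked[:n]), set(ranked[-n:])


def enrolment_shares(diversity_rows, colleges, excluded=EXCLUDED):
    """Percentage of enrolment per category across `colleges` (an inner join on name)."""
    counts = defaultdict(float)
    for r in diversity_rows:
        if r["name"] in colleges and r["category"] not in excluded:
            counts[r["category"]] += r["enrollment"]
    total = sum(counts.values())
    return {cat: 100 * c / total for cat, c in counts.items()} if total else {}


# Hypothetical data, for illustration only.
tuition_rows = [
    {"name": "Pricey College", "tuition": 50000},
    {"name": "Pricey College", "tuition": 52000},
    {"name": "Middle College", "tuition": 20000},
    {"name": "Cheap College", "tuition": 3000},
]
diversity_rows = [
    {"name": "Pricey College", "category": "White", "enrollment": 600},
    {"name": "Pricey College", "category": "Hispanic", "enrollment": 400},
    {"name": "Pricey College", "category": "Women", "enrollment": 550},
    {"name": "Pricey College", "category": "Total Minority", "enrollment": 400},
    {"name": "Cheap College", "category": "Hispanic", "enrollment": 900},
]

expensive, inexpensive = split_by_cost(tuition_rows, n=1)
print(enrolment_shares(diversity_rows, expensive))  # {'White': 60.0, 'Hispanic': 40.0}
```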
Figure 5.9 shows that most students enrolled in the 50 most expensive colleges in the US are White, followed by Hispanic, Black and Asian students, who form the minority.
Figure 5.10 shows that in the 50 least expensive colleges in the US, most students are Hispanic, Black or Asian, followed by White students. Native Hawaiian and American Indian students again form the smallest groups. One possible reason for the high enrolment of these ethnic groups in the least expensive colleges is that most of these colleges are community colleges.
Does paying high tuition guarantee a higher salary? Which college has more potential in career growth on the basis of initial salary offered?
As we have seen in the earlier sections, students make a huge investment in tuition fees to propel their careers. In this section we try to find out whether this investment does indeed bear the expected results.
To find out whether there is any relationship between tuition and career growth, we first joined the tuition and salary datasets on university name. We then produced a scatter-plot matrix using the ggpairs function of the GGally package.
A scatter-plot matrix is a powerful tool for visualising the correlations between the variables in a dataset. As figure 5.11 shows, apart from the Impact variable (the percentage of alumni who think they are making the world a better place), all the variables are positively correlated with one another. This suggests that a higher tuition fee tends to go with a higher initial salary and a higher mid-career salary. Another interesting point is that a higher share of STEM students is associated with better initial and mid-career pay. (STEM is an acronym for science, technology, engineering and mathematics.) Offering STEM courses may therefore boost the prospects of a university and of its students.
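The join and the correlation coefficient can be sketched in Python (not the report's R code). Pearson's r is assumed here as the statistic of interest, since that is what the upper panels of a ggpairs plot report by default; the college names and figures are hypothetical, and the function names are hypothetical too.

```python
import math


def inner_join(left, right, key="name"):
    """Merge rows that share the same key; rows without a match are dropped."""
    index = {r[key]: r for r in right}
    return [{**row, **index[row[key]]} for row in left if row[key] in index]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


# Hypothetical rows; real inputs would be the tuition and salary_potential datasets.
tuition = [
    {"name": "College A", "tuition": 10000},
    {"name": "College B", "tuition": 25000},
    {"name": "College C", "tuition": 40000},
    {"name": "College D", "tuition": 15000},  # no salary record: dropped by the join
]
salary = [
    {"name": "College A", "early_career_pay": 48000},
    {"name": "College B", "early_career_pay": 52000},
    {"name": "College C", "early_career_pay": 61000},
]

joined = inner_join(tuition, salary)
r = pearson([row["tuition"] for row in joined], [row["early_career_pay"] for row in joined])
print(len(joined), round(r, 3))
```

Joining on names only works when both sources spell each university identically; unmatched names silently drop out, as College D does here.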
There are also some observations that run against this correlation. They can be found by hovering over the plot, which has been made interactive using the plotly package in R.
One such observation has an initial pay of USD 81K and a mid-career pay of around USD 144K, while its STEM percentage is only 2% and its impact is 82%. Another has a tuition of just USD 7,000 but an above-average initial pay of USD 62K and a mid-career pay of USD 120K. These differ from the overall pattern in the figure above, where a higher STEM percentage and a higher tuition cost go with higher initial and mid-career salaries.
Having observed the correlation between tuition and initial salary, let us now look at the universities that offer the highest initial salaries and check whether they also support growth throughout a career.
Figure 5.12 shows the 25 universities offering the highest initial salaries. It appears that offering the highest initial salary does not guarantee career growth: Samuel Merritt University offers the highest initial salary but a career growth of only 68.97%. Career growth is the percentage increase in an individual's salary halfway through their career compared with their initial salary.
The highest career growth is observed at Harvard University, where the mid-career salary almost doubles, a career growth of 96.26%. Almost all of these universities except Samuel Merritt University show career growth of more than 75%, which is satisfactory.
Conclusion
Before exploring all the datasets at hand in greater depth, we asked ourselves: how valuable is the investment in American universities? After profound exploration, thorough analysis and meticulous inspection, we can safely say that we have an answer.
Firstly, we observed that most students, whether in-state residents or out-of-state students, enrol in public universities for higher education, as tuition fees at public universities are comparatively lower than at private or for-profit universities. We also noticed that out-of-state students pay higher tuition fees at public universities than in-state residents. Then we explored the average tuition burden that students bear across various income groups. We also found that universities offer scholarships to eligible students based on academic accolades and on their economic backgrounds. Students from weaker economic backgrounds received more scholarships than those from better-off backgrounds, thereby reducing the net cost, or burden, on such students.
Then we investigated which states in the US offer the best career growth opportunities to students by calculating the improvement in their salaries halfway into their careers. Almost all the states provided growth of more than 70%, which was central to answering our primary research question.
We also probed whether alumni from different states think that they are contributing to making the world a better place. In most states, almost 50% of alumni think that they contribute towards making the world a better place. Then we looked at the different ethnic groups enrolled in US universities.
We saw that the US is accommodating towards the different ethnic groups who want to live the American dream. We then examined how different states accommodate minority groups in their prestigious universities, and how the most expensive and the least expensive colleges embrace a multicultural population. Finally, we scrutinised the relationship between tuition cost, initial salary and mid-career salary, which were all positively correlated. This supports the view that the heavy investment students make in tuition fees is worthwhile.
After diligent and rigorous exploration we can safely say that pursuing a university degree from the US is definitely worth every penny which a student pays for a better future.